Abstract. We analyze the so-called ppz algorithm for $(d,k)$-CSP problems
for general values of $d$ (number of values a variable can take) and
$k$ (number of literals per constraint). To analyze its success probability,
we prove a correlation inequality for submodular functions.

1 Introduction 

Consider the following extremely simple randomized algorithm for $k$-SAT: Pick
a variable uniformly at random and call it $x$. If the formula $F$ contains the
unit clause $(x)$, set $x$ to 1. If it contains $(\bar{x})$, set it to 0. If it
contains neither, set $x$ uniformly at random (and if it contains both unit
clauses, give up). This algorithm has been proposed and analyzed by Paturi,
Pudlák, and Zane [4] and is called ppz.

The idea behind analyzing its success probability can be illustrated nicely
if we assume, for the moment, that $F$ has a unique satisfying assignment
$\alpha$ setting all variables to 1. Switching a variable $x$ from 1 to 0 makes
the formula unsatisfied. Therefore, there is a clause
$C_x = (x \vee \bar{y}_1 \vee \dots \vee \bar{y}_{k-1})$. With probability
$1/k$, the algorithm picks and sets $y_1, \dots, y_{k-1}$ before picking $x$.
Supposing the $y_j$ have been set correctly (i.e., to 1), the clause $C_x$ is
now reduced to $(x)$, and therefore $x$ is also set correctly. Intuitively, this
shows that on average, the algorithm has to guess $(1 - 1/k)n$ variables
correctly and can infer the correct values of the remaining $n/k$ variables.
This increases the success probability of the algorithm from $2^{-n}$ (simple
stupid guessing) to $2^{-n(1 - 1/k)}$.

In this paper we generalize the sketched algorithm to general constraint
satisfaction problems, CSPs for short. These are a generalization of boolean
satisfiability to problems involving more than two truth values. A set of $n$
variables $x_1, \dots, x_n$ is given, each of which can take a value from
$[d] := \{1, \dots, d\}$. Each assignment to the $n$ variables can be
represented as an element of $[d]^n$. A literal is an expression of the form
$(x_i \neq c)$ for some $c \in [d]$. A CSP formula consists of a conjunction
(AND) of constraints, where a constraint is a disjunction (OR) of literals. We
speak of a $(d,k)$-CSP formula if each constraint consists of at most $k$
literals. Finally, $(d,k)$-CSP is the problem of deciding whether a given
$(d,k)$-CSP formula has a satisfying assignment. Note that $(2,k)$-CSP is the
same as $k$-SAT. Also, $(d,k)$-CSP is well known to be NP-complete, unless
$d = 1$, $k = 1$, or $d = k = 2$. We can manipulate a CSP formula $F$ by
permanently substituting a value $c$ for a variable $x$. This means we remove
all satisfied constraints, i.e., those containing a literal $(x \neq c')$ for
some $c' \neq c$, and from the remaining constraints remove the literal
$(x \neq c)$, if present. We denote the resulting formula by
$F^{[x \mapsto c]}$.

It is obvious how to generalize the algorithm to $(d,k)$-CSP problems. Again
we process the variables in a random order. When picking $x$, we collect all
unit constraints of the form $(x \neq c)$ and call the value $c$ forbidden.
Values in $[d]$ which are not forbidden are called allowed, and we set $x$ to a
value that we choose uniformly at random from all allowed values. How can one
analyze the success probability? Let us demonstrate this for $d = k = 3$.
Suppose $F$ has exactly one satisfying assignment $\alpha = (1, \dots, 1)$.
Since changing the value of a variable $x$ from 1 to 2 or to 3 makes $F$
unsatisfied, we find critical constraints

\[
\begin{aligned}
&(x \neq 2 \vee y \neq 1 \vee z \neq 1) \\
&(x \neq 3 \vee u \neq 1 \vee v \neq 1)
\end{aligned}
\]

If all variables $y, z, u, v$ are picked before $x$, then there is only one
allowed value for $x$ left, namely 1, and with probability 1, the algorithm
picks the correct value. If $y, z$ come before $x$, but at least one of $u$ or
$v$ comes after $x$, then it is possible that the values 1 and 3 are allowed,
and the algorithm picks the correct value with probability 1/2. In theory, we
could list all possible cases and compute their probability. But here comes the
difficulty: The probability of all variables $y, z, u, v$ being picked before
$x$ depends on whether these variables are distinct! Maybe $y = u$, or
$z = v$... For general $d$ and $k$, we get $d - 1$ critical constraints

\[
\begin{aligned}
C_2 &:= (x \neq 2 \vee y_1^{(2)} \neq 1 \vee \dots \vee y_{k-1}^{(2)} \neq 1) \\
C_3 &:= (x \neq 3 \vee y_1^{(3)} \neq 1 \vee \dots \vee y_{k-1}^{(3)} \neq 1) \\
&\;\;\vdots \\
C_d &:= (x \neq d \vee y_1^{(d)} \neq 1 \vee \dots \vee y_{k-1}^{(d)} \neq 1)
\end{aligned}
\tag{1}
\]

We are interested in the distribution of the number of allowed values for $x$.
However, the above constraints can intersect in complicated ways, since we have
no guarantee that the variables $y_j^{(c)}$ are distinct. Our main technical
contribution is a sort of correlation lemma showing that in the worst case, the
$y_j^{(c)}$ are indeed distinct, and therefore we can focus on that case, which
we are able to analyze.

Previous Work 

Feder and Motwani [1] were the first to generalize the ppz algorithm to CSP
problems. In their paper, they consider the $(d,2)$-CSP problem, i.e., each
variable can take on $d$ values, and every constraint has at most two literals.
In this case, the clauses $C_2, \dots, C_d$ cannot form complex patterns. Feder
and Motwani show that the worst case happens if (i) the variables
$y_1^{(2)}, \dots, y_1^{(d)}$ are pairwise distinct and (ii) the CSP formula has
a unique satisfying assignment. However, their proofs do not directly
generalize to higher values of $k$.

Recently, Li, Li, Liu, and Xu [2] analyzed ppz for general CSP problems (i.e.,
$d, k \ge 3$). Their analysis is overly pessimistic, though, since they
distinguish only the following two cases, for each variable $x$: When ppz
processes $x$, then either (i) all $d$ values are allowed, or (ii) at least one
value is forbidden. In case (ii), ppz chooses one value randomly from at most
$d - 1$ values. Since case (ii) happens with some reasonable probability, this
gives a better success probability than the trivial $d^{-n}$. However, the
authors ignore the case that two, three, or more values are forbidden and lump
it together with case (ii). Therefore, their analysis does not capture the full
power of ppz.

Our Contribution 

Our contribution is to show that "everything works as expected", i.e., that in
the worst case all variables $y_j^{(c)}$ in (1) are distinct and the formula has
a unique satisfying assignment. For this case, we can compute (or at least,
bound from below) the success probability of the algorithm.

Theorem 1.1. For $d, k \ge 1$, define

\[
G(d,k) := \sum_{j=0}^{d-1} \log_2(1+j) \binom{d-1}{j}
          \int_0^1 \left(1 - r^{k-1}\right)^{j} \left(r^{k-1}\right)^{d-1-j} \, dr .
\]

Then there is a randomized algorithm running in polynomial time which, given a
$(d,k)$-CSP formula over $n$ variables, returns a satisfying assignment with
probability at least $2^{-nG(d,k)}$.
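The exponent in Theorem 1.1 can be evaluated numerically. The following is a minimal sketch, not part of the original analysis: the function names `integral_term`, `G`, and `ppz_base` are illustrative, and the integral is computed exactly by expanding $(1 - r^{k-1})^j$ binomially and integrating each monomial. For $(d,k) = (2,3)$ the integral equals $2/3$, so the base is $2^{2/3} \approx 1.587$.

```python
from fractions import Fraction
from math import comb, log2


def integral_term(d: int, k: int, j: int) -> Fraction:
    """Exact integral over [0, 1] of (1 - r^(k-1))^j * (r^(k-1))^(d-1-j)."""
    total = Fraction(0)
    for i in range(j + 1):
        # Binomial expansion of (1 - r^(k-1))^j; the monomial r^m integrates to 1/(m+1).
        m = (k - 1) * (d - 1 - j + i)
        total += Fraction((-1) ** i * comb(j, i), m + 1)
    return total


def G(d: int, k: int) -> float:
    """The exponent G(d, k) as defined in Theorem 1.1."""
    return sum(
        log2(1 + j) * comb(d - 1, j) * float(integral_term(d, k, j))
        for j in range(d)
    )


def ppz_base(d: int, k: int) -> float:
    """Base c such that the bound 2^(-n G(d, k)) reads c^(-n)."""
    return 2.0 ** G(d, k)


if __name__ == "__main__":
    for d, k in [(2, 3), (5, 4), (6, 4)]:
        print((d, k), round(ppz_base(d, k), 3))
```

Using exact fractions for the integrals avoids cancellation errors in the alternating binomial sum; only the final weighting by $\log_2(1+j)$ is done in floating point.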

The algorithm we analyze in this paper is not novel. It is a straightforward
generalization of the ppz algorithm to CSP problems with more than two truth
values. However, its analysis is significantly more difficult than for $d = 2$
(and also more difficult than for large $d$ and $k = 2$, the case Feder and
Motwani [1] investigated).

Comparison 

We compare the success probability of Schöning's random walk algorithm with
that of ppz. For ppz, we state the bound given by Li, Li, Liu, and Xu [2] and by
this paper. All bounds are approximate and ignore polynomial factors.

    (d, k)    Schöning [5]    Li, Li, Liu, and Xu [2]    this paper
    (2, 3)    $1.334^{-n}$    $1.588^{-n}$               $1.588^{-n}$
    (3, 3)                    $2.62^{-n}$                $2.077^{-n}$
    (5, 4)    $3.75^{-n}$     $4.73^{-n}$                $3.672^{-n}$
    (6, 4)    $4.5^{-n}$      $5.73^{-n}$                $4.33^{-n}$

For small values of $d$, in particular for the boolean case $d = 2$, Schöning's
random walk algorithm is much faster than ppz, but ppz overtakes Schöning
already for moderately large values of $d$ and thus is, to our knowledge, the
currently fastest algorithm for $(d,k)$-CSP.

2 The Algorithm 

The algorithm itself is simple. It processes the variables $x_1, \dots, x_n$
according to some random permutation $\pi$. When the algorithm processes the
variable $x$, it collects all unit constraints of the form $(x \neq c)$ and
calls $c$ forbidden. A truth value $c$ that is not forbidden is called allowed.
If the formula is satisfiable when the algorithm processes $x$, there is
obviously at least one allowed value. The algorithm chooses uniformly at random
an allowed value $c$ and sets $x$ to $c$, reducing the formula. Then it proceeds
to the next variable. For technical reasons, we think of the permutation $\pi$
as part of the input to the algorithm, and of sampling $\pi$ uniformly at random
from all $n!$ permutations before calling the algorithm. The algorithm is
described formally in Algorithm 1. To analyze



Algorithm 1 ppz($F$: a $(d,k)$-CSP formula over variables $V := \{x_1, \dots, x_n\}$,
$\pi$: a permutation of $V$)

     1: $\alpha$ := the empty assignment
     2: for $i = 1, \dots, n$ do
     3:     $x := x_{\pi(i)}$
     4:     $S(x,\pi) := \{c \in [d] \mid (x \neq c) \notin F\}$
     5:     if $S(x,\pi) = \emptyset$ then
     6:         return failure
     7:     end if
     8:     $b \leftarrow_{\text{u.a.r.}} S(x,\pi)$
     9:     $\alpha := \alpha \cup [x \mapsto b]$
    10:     $F := F^{[x \mapsto b]}$
    11: end for
    12: if $\alpha$ satisfies $F$ then
    13:     return $\alpha$
    14: else
    15:     return failure
    16: end if



the success probability of the algorithm, we can assume that $F$ is satisfiable,
i.e., the set $\mathrm{sat}_V(F)$ of satisfying assignments is nonempty. This is
because if $F$ is unsatisfiable, the algorithm always correctly returns failure.
For a fixed satisfying assignment $\alpha$, we will bound the probability

\[
\Pr[\mathrm{ppz}(F,\pi) \text{ returns } \alpha] , \tag{2}
\]

where the probability is over the choice of $\pi$ and over the randomness used
by ppz. The overall success probability is given by

\[
\Pr[\mathrm{ppz}(F,\pi) \text{ is successful}]
  = \sum_{\alpha \in \mathrm{sat}_V(F)} \Pr[\mathrm{ppz}(F,\pi) \text{ returns } \alpha] . \tag{3}
\]

In the next section, we will bound (2) from below. The bound depends on the
level of isolatedness of $\alpha$: If $\alpha$ has many satisfying neighbors,
its probability to be returned by ppz decreases. However, the existence of many
satisfying assignments will in turn increase the sum in (3). In the end, it
turns out that the worst case happens if $F$ has a unique satisfying
assignment. Observe that for the ppz algorithm in the boolean case [4], the
unique satisfiable case is also the worst case, whereas for the improved
version ppsz [3], it is not, or at least not known to be.
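The algorithm described in this section can be sketched in a few lines. This is only an illustration under assumed conventions that do not appear in the paper: a literal $(x \neq c)$ is stored as the pair `(x, c)`, a constraint is a tuple of literals, a formula is a list of constraints, and the names `substitute`, `satisfies`, and `ppz` are hypothetical.

```python
import random


def substitute(formula, x, b):
    """Return F^[x -> b]: drop satisfied constraints, shrink the remaining ones."""
    reduced = []
    for constraint in formula:
        if any(v == x and c != b for v, c in constraint):
            continue  # a literal (x != c) with c != b is true after setting x to b
        reduced.append(tuple(lit for lit in constraint if lit != (x, b)))
    return reduced


def satisfies(formula, alpha):
    """Check that every constraint contains at least one true literal."""
    return all(any(alpha[v] != c for v, c in con) for con in formula)


def ppz(formula, d, order, rng):
    """One run of ppz; `order` lists the variables in the order given by pi."""
    alpha = {}
    current = list(formula)
    for x in order:
        # Values forbidden by unit constraints (x != c) of the reduced formula.
        forbidden = {c for con in current if len(con) == 1 for v, c in con if v == x}
        allowed = [c for c in range(1, d + 1) if c not in forbidden]
        if not allowed:
            return None  # failure
        b = rng.choice(allowed)
        alpha[x] = b
        current = substitute(current, x, b)
    return alpha if satisfies(formula, alpha) else None


if __name__ == "__main__":
    # (x != 2 or y != 1) and (x != 3 or z != 1) with d = 3.
    F = [(("x", 2), ("y", 1)), (("x", 3), ("z", 1))]
    rng = random.Random(0)
    order = ["x", "y", "z"]
    rng.shuffle(order)
    print(order, ppz(F, 3, order, rng))
```

The sketch checks the final assignment against the original formula, which is equivalent to detecting an empty (falsified) constraint in the reduced formula.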



3 Analyzing the Success Probability 
3.1 Preliminaries 

In this section, fix a satisfying assignment $\alpha$. For simplicity, assume
that $\alpha = (1, \dots, 1)$, i.e., it sets every variable to 1. What is the
probability that ppz returns $\alpha$? For a permutation $\pi$ and a variable
$x$, let $\beta$ be the partial truth assignment obtained by restricting
$\alpha$ to the variables that come before $x$ in $\pi$, and define

\[
S(x,\pi,\alpha) := \{ c \in [d] \mid (x \neq c) \notin F^{[\beta]} \} .
\]

In words, we process the variables according to $\pi$ and set them according to
$\alpha$, but stop before processing $x$. We check which truth values are not
forbidden for $x$ by a unit constraint, and collect these truth values in the
set $S(x,\pi,\alpha)$. Let us give an example:

Example. Let $d = 3$, $k = 2$, and $\alpha = (1, \dots, 1)$. We consider

\[
F = (x \neq 2 \vee y \neq 1) \wedge (x \neq 3 \vee z \neq 1) .
\]

For $\pi = (x, y, z)$, no value is forbidden when processing $x$, thus
$S(x,\pi,\alpha) = \{1, 2, 3\}$. For $\pi' = (y, x, z)$, we consider the partial
assignment that sets $y$ to 1, obtaining

\[
F^{[y \mapsto 1]} = (x \neq 2) \wedge (x \neq 3 \vee z \neq 1) ,
\]

and $S(x,\pi',\alpha) = \{1, 3\}$. Last, for $\pi'' = (y, z, x)$, we set $y$ and
$z$ to 1, obtaining

\[
F^{[y \mapsto 1, z \mapsto 1]} = (x \neq 2) \wedge (x \neq 3) ,
\]

thus $S(x,\pi'',\alpha) = \{1\}$. □
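The three sets in this example can be recomputed mechanically. The sketch below assumes an illustrative encoding (a literal $(x \neq c)$ is the pair `(x, c)`, a constraint is a tuple of literals) and hypothetical helper names `substitute` and `allowed_values`; it is not taken from the paper.

```python
def substitute(formula, x, b):
    """Return F^[x -> b]: drop satisfied constraints, shrink the remaining ones."""
    reduced = []
    for constraint in formula:
        if any(v == x and c != b for v, c in constraint):
            continue  # satisfied by x := b
        reduced.append(tuple(lit for lit in constraint if lit != (x, b)))
    return reduced


def allowed_values(formula, d, order, x, alpha):
    """Compute S(x, pi, alpha): set the variables before x according to alpha,
    then collect the values of x not forbidden by a unit constraint."""
    current = list(formula)
    for v in order:
        if v == x:
            break
        current = substitute(current, v, alpha[v])
    forbidden = {c for con in current if len(con) == 1 for v, c in con if v == x}
    return {c for c in range(1, d + 1) if c not in forbidden}


if __name__ == "__main__":
    F = [(("x", 2), ("y", 1)), (("x", 3), ("z", 1))]
    alpha = {"x": 1, "y": 1, "z": 1}
    for order in [("x", "y", "z"), ("y", "x", "z"), ("y", "z", "x")]:
        print(order, sorted(allowed_values(F, 3, order, "x", alpha)))
```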



Observe that $S(x,\pi,\alpha)$ is non-empty, since
$\alpha(x) \in S(x,\pi,\alpha)$, i.e., the value $\alpha$ assigns to $x$ is
always allowed. What has to happen in order for the algorithm to return
$\alpha$? In every step of ppz, the value $b$ selected in Line 8 for variable
$x$ must be $\alpha(x)$. Assume now that this was the case in each of the first
$i$ steps of the algorithm, i.e., the variables $x_{\pi(1)}, \dots, x_{\pi(i)}$
have been set to their respective values under $\alpha$. Let $x = x_{\pi(i+1)}$
be the variable processed in step $i + 1$. The set $S(x,\pi,\alpha)$ coincides
with the set $S(x,\pi)$ of the algorithm, and therefore $x$ is set to
$\alpha(x)$ with probability $1/|S(x,\pi,\alpha)|$. Since this holds in every
step of the algorithm, we conclude that for a fixed permutation $\pi$,

\[
\Pr[\mathrm{ppz}(F,\pi) \text{ returns } \alpha] = \prod_{x \in V} \frac{1}{|S(x,\pi,\alpha)|} .
\]

For $\pi$ being chosen uniformly at random, we obtain

\[
\Pr[\mathrm{ppz}(F,\pi) \text{ returns } \alpha]
  = \mathbf{E}_\pi\!\left[ \prod_{x \in V} \frac{1}{|S(x,\pi,\alpha)|} \right] .
\]



The expectation of a product is an uncomfortable term if the factors are not
independent. The usual trick in this context is to apply Jensen's inequality,
hoping that we do not lose too much.

Lemma 3.1 (Jensen's Inequality). Let $X$ be a random variable and
$f : \mathbb{R} \to \mathbb{R}$ a convex function. Then
$\mathbf{E}[f(X)] \ge f(\mathbf{E}[X])$, provided both expectations exist.

We apply Jensen's inequality with the convex function being
$f : t \mapsto 2^{-t}$ and the random variable being
$X = \sum_{x \in V} \log_2 |S(x,\pi,\alpha)|$. With this notation,
$f(X) = \prod_{x \in V} \frac{1}{|S(x,\pi,\alpha)|}$, the expectation of which
we want to bound from below.

\[
\begin{aligned}
\mathbf{E}\!\left[ \prod_{x \in V} \frac{1}{|S(x,\pi,\alpha)|} \right]
  &= \mathbf{E}\!\left[ 2^{-\sum_{x \in V} \log_2 |S(x,\pi,\alpha)|} \right] \\
  &\ge 2^{\mathbf{E}\left[ -\sum_{x \in V} \log_2 |S(x,\pi,\alpha)| \right]} \\
  &= 2^{-\sum_{x \in V} \mathbf{E}\left[ \log_2 |S(x,\pi,\alpha)| \right]} .
\end{aligned}
\tag{4}
\]



Proposition 3.2. $\Pr[\mathrm{ppz}(F,\pi) \text{ returns } \alpha] \ge 2^{-\sum_{x \in V} \mathbf{E}[\log_2 |S(x,\pi,\alpha)|]}$.

Example: The boolean case. In the boolean case, the set $S(x,\pi,\alpha)$ is
either $\{1\}$ or $\{0, 1\}$, and thus the logarithm is either 0 or 1.
Therefore, the term $\mathbf{E}[\log_2 |S(x,\pi,\alpha)|]$ is the probability
that the value of $x$ is not determined by a unit clause, and thus has to be
guessed.
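For a small formula, both sides of Proposition 3.2 can be computed by enumerating all permutations. The sketch below is an illustration only; it reuses an assumed encoding (literal $(x \neq c)$ as the pair `(x, c)`) and hypothetical helper names, and it evaluates the example formula $(x \neq 2 \vee y \neq 1) \wedge (x \neq 3 \vee z \neq 1)$ with $\alpha = (1,1,1)$.

```python
from itertools import permutations
from math import log2, prod


def substitute(formula, x, b):
    """Return F^[x -> b]: drop satisfied constraints, shrink the remaining ones."""
    reduced = []
    for constraint in formula:
        if any(v == x and c != b for v, c in constraint):
            continue
        reduced.append(tuple(lit for lit in constraint if lit != (x, b)))
    return reduced


def allowed_values(formula, d, order, x, alpha):
    """Compute S(x, pi, alpha) for the permutation given by `order`."""
    current = list(formula)
    for v in order:
        if v == x:
            break
        current = substitute(current, v, alpha[v])
    forbidden = {c for con in current if len(con) == 1 for v, c in con if v == x}
    return {c for c in range(1, d + 1) if c not in forbidden}


def exact_and_jensen(formula, d, variables, alpha):
    """Return (E[prod 1/|S|], 2^(-sum E[log2 |S|])) over uniform permutations."""
    perms = list(permutations(variables))
    exact, log_sum = 0.0, 0.0
    for order in perms:
        sizes = [len(allowed_values(formula, d, order, x, alpha)) for x in variables]
        exact += prod(1.0 / s for s in sizes)
        log_sum += sum(log2(s) for s in sizes)
    return exact / len(perms), 2.0 ** (-log_sum / len(perms))


if __name__ == "__main__":
    F = [(("x", 2), ("y", 1)), (("x", 3), ("z", 1))]
    alpha = {"x": 1, "y": 1, "z": 1}
    print(exact_and_jensen(F, 3, ("x", "y", "z"), alpha))
```

In this example $y$ and $z$ are never forced, so the exact probability is $\frac{1}{9} \cdot \frac{11}{18}$, and the Jensen bound is slightly smaller.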

So far the calculations are exactly as in the boolean ppz. This will not stay
that way for long. In the boolean case, there are only two cases: Either the
value of $x$ is determined by a unit clause (in which case we call $x$ forced),
or it is not. For $d \ge 3$, there are more cases: The set of potential values
for $x$ can be the full range $[d]$, it can be just the singleton $\{1\}$, but
it can also be anything in between, and even if the algorithm cannot determine
the value of $x$ by looking at unit clauses, it will still be happy if at
least, say, $d/2$ values are forbidden by unit clauses.

3.2 Analyzing $\mathbf{E}[\log_2 |S(x,\pi,\alpha)|]$

In this section we prove an upper bound on
$\mathbf{E}[\log_2 |S(x,\pi,\alpha)|]$. We assume without loss of generality
that $\alpha = (1, \dots, 1)$. There are $d$ truth assignments
$\alpha_1, \dots, \alpha_d$ agreeing with $\alpha$ on the variables
$V \setminus \{x\}$: For a value $c \in [d]$ we define
$\alpha_c := \alpha[x \mapsto c]$, i.e., we change the value $\alpha$ assigns to
$x$ to $c$, but keep all other variables fixed. Clearly, $\alpha_1 = \alpha$.
The number of assignments among $\alpha_1, \dots, \alpha_d$ that satisfy $F$ is
called the looseness of $\alpha$ at $x$, denoted by $\ell(\alpha,x)$.

Since $\alpha_1 = \alpha$ satisfies $F$, the looseness of $\alpha$ at $x$ is at
least 1, and since there are $d$ possible values for $x$, the looseness is at
most $d$. Thus $1 \le \ell(\alpha,x) \le d$. If $\alpha$ is the unique
satisfying assignment, then $\ell(\alpha,x) = 1$ for every $x$. Note that
$\alpha$ being unique is sufficient, but not necessary: Suppose
$\alpha = (1, \dots, 1)$ and $\alpha' = (2, 2, 1, 1, \dots, 1)$ are the only two
satisfying assignments. Then $\ell(\alpha,x) = \ell(\alpha',x) = 1$ for every
variable $x$.

Why are we considering the looseness $\ell$ of $\alpha$ at $x$? Suppose without
loss of generality that the assignments $\alpha_1, \dots, \alpha_\ell$ satisfy
$F$, whereas $\alpha_{\ell+1}, \dots, \alpha_d$ do not. The set
$S(x,\pi,\alpha)$ is a random object depending on $\pi$, but one thing is sure:

\[
\text{for all } c = 1, \dots, \ell(\alpha,x) : \quad c \in S(x,\pi,\alpha) .
\]

For $\ell(\alpha,x) < c \le d$, what is the probability that
$c \in S(x,\pi,\alpha)$? Since $\alpha_c$ does not satisfy $F$, there must be a
constraint in $F$ that is satisfied by $\alpha$ but not by $\alpha_c$. Since
$\alpha$ and $\alpha_c$ disagree on $x$ only, that constraint must be of the
following form:

\[
(x \neq c \vee y_2 \neq 1 \vee y_3 \neq 1 \vee \dots \vee y_k \neq 1) \tag{5}
\]

for some $k - 1$ variables $y_2, \dots, y_k$. We do not rule out constraints
with fewer than $k$ literals, but we capture this by not insisting on the $y_j$
in (5) being distinct. In any case, if the variables $y_2, \dots, y_k$ come
before $x$ in the permutation $\pi$, then $c \notin S(x,\pi,\alpha)$: This is
because after setting to 1 the variables that come before $x$, the constraint
in (5) has been reduced to $(x \neq c)$. Note that $y_2, \dots, y_k$ coming
before $x$ is sufficient for $c \notin S(x,\pi,\alpha)$, but not necessary,
since there could be multiple constraints of the form (5). With probability at
least $1/k$, all variables $y_2, \dots, y_k$ come before $x$, and we conclude:

Proposition 3.3. If $\alpha_c$ does not satisfy $F$, then
$\Pr[c \in S(x,\pi,\alpha)] \le 1 - 1/k$.

This proposition is nice, but not yet useful on its own. We can use it to
finish the analysis of the running time; however, we will end up with a
suboptimal estimate.

3.3 A Suboptimal Analysis of ppz

The function $t \mapsto \log_2(t)$ is concave. We apply Jensen's inequality to
conclude that

\[
\mathbf{E}[\log_2 |S(x,\pi,\alpha)|]
  \le \log_2\left( \mathbf{E}[|S(x,\pi,\alpha)|] \right)
  = \log_2\left( \sum_{c=1}^{d} \Pr[c \in S(x,\pi,\alpha)] \right) . \tag{6}
\]

We apply what we have learned above: For $c = 1, \dots, \ell(\alpha,x)$, it
always holds that $c \in S(x,\pi,\alpha)$, and for
$c = \ell(\alpha,x) + 1, \dots, d$, we have computed that
$\Pr[c \in S(x,\pi,\alpha)] \le 1 - 1/k$. Therefore

\[
\mathbf{E}[\log_2 |S(x,\pi,\alpha)|]
  \le \log_2\left( \ell(\alpha,x) + (d - \ell(\alpha,x)) \left(1 - \frac{1}{k}\right) \right) .
\]

The unique case. If $\alpha$ is the unique satisfying assignment, then
$\ell(\alpha,x) = 1$ for every variable $x$ in our CSP formula $F$, and the
above term becomes

\[
\log_2\left( 1 + \frac{(d-1)(k-1)}{k} \right) = \log_2\left( \frac{d(k-1) + 1}{k} \right) .
\]

We plug this into the bound of Proposition 3.2:

\[
\begin{aligned}
\Pr[\mathrm{ppz} \text{ returns } \alpha]
  &\ge 2^{-\sum_{x \in V} \mathbf{E}[\log_2 |S(x,\pi,\alpha)|]} \\
  &\ge 2^{-n \log_2\left( \frac{d(k-1)+1}{k} \right)} .
\end{aligned}
\]

The success probability of Schöning's algorithm for $(d,k)$-CSP problems is
$\left( \frac{d(k-1)}{k} \right)^{-n}$, and we see that even for the unique
case, our analysis of ppz does not yield anything better than Schöning.
Discouraged by this failure, we do not continue this suboptimal analysis for
the non-unique case.



3.4 Detour: Jensen's Inequality Here, There, and Everywhere 

The main culprit behind the poor performance of our analysis is Jensen's
inequality in (6). To improve our analysis, we refrain from applying Jensen's
inequality there and instead try to analyze the term
$\mathbf{E}[\log_2 |S(x,\pi,\alpha)|]$ directly. However, recall that we have
used Jensen's inequality before, in (4). Is it safe to apply it there? How can
we tell when applying it makes sense and when it definitely does not? To
discuss this issue, we restate the two applications of Jensen's inequality:

\[
\mathbf{E}\!\left[ 2^{-\sum_{x \in V} \log_2 |S(x,\pi,\alpha)|} \right]
  \ge 2^{\mathbf{E}\left[ -\sum_{x \in V} \log_2 |S(x,\pi,\alpha)| \right]} \tag{7}
\]

\[
\mathbf{E}[\log_2 |S(x,\pi,\alpha)|] \le \log_2\left( \mathbf{E}[|S(x,\pi,\alpha)|] \right) \tag{8}
\]

Formally, Jensen's inequality states that for a random variable $X$ and a
convex function $f$, it holds that

\[
\mathbf{E}[f(X)] \ge f(\mathbf{E}[X]) , \tag{9}
\]

and by multiplying (9) by $-1$ one obtains a similar inequality for concave
functions. As a rule of thumb, Jensen's inequality is pretty tight if $X$ is
very concentrated around its expectation: In the most extreme case, $X$ is a
constant, and (9) holds with equality. On the other extreme, suppose $X$ is a
random variable taking on values $-m$ and $m$, each with probability 1/2, and
let $f : t \mapsto t^2$, which is a convex function. The left-hand side of (9)
evaluates to $\mathbf{E}[f(X)] = \mathbf{E}[X^2] = m^2$, whereas the right-hand
side evaluates to $f(\mathbf{E}[X]) = f(0) = 0$, and Jensen's inequality is
very loose indeed. What random variables are we dealing with in (7) and (8)?
These are

\[
X := \sum_{x \in V} \log_2 |S(x,\pi,\alpha)| \quad \text{and} \quad
Y := |S(x,\pi,\alpha)| ,
\]

and the corresponding functions are $f : t \mapsto 2^{-t}$, which is convex,
and $g : t \mapsto \log_2 t$, which is concave. In both cases, the underlying
probability space is the set of all permutations of $V$, endowed with the
uniform distribution. We see that $Y$ is not concentrated at all: Suppose $x$
comes first in $\pi$. If our CSP formula $F$ contains no unit constraints, then
$|S(x,\pi,\alpha)| = d$, i.e., no truth value is forbidden by a unit
constraint. On the other hand, if $x$ comes last in $\pi$, then
$|S(x,\pi,\alpha)| = \ell(\alpha,x)$. Either case happens with probability
$1/n$, which is not very small. Thus, the random variable $|S(x,\pi,\alpha)|$
does not seem to be very concentrated.

Contrary to $Y$, the random variable $X$ can be very concentrated; in fact, for
certain CSP formulas it can be a constant: Suppose $d = 2$, i.e., the boolean
case. Here $X$ simply counts the number of non-forced variables. Consider the
2-CNF formula

\[
\bigwedge_{i=1}^{n/2} (x_i \vee \bar{y}_i) \wedge (\bar{x}_i \vee y_i) \wedge (x_i \vee y_i) . \tag{10}
\]

This formula has $n$ variables, and $\alpha = (1, \dots, 1)$ is the unique
satisfying assignment. Observe that if $x_i$ comes before $y_i$ in $\pi$, then
$S(x_i,\pi,\alpha) = \{0, 1\}$ and $S(y_i,\pi,\alpha) = \{1\}$. If $y_i$ comes
before $x_i$, then $S(x_i,\pi,\alpha) = \{1\}$ and
$S(y_i,\pi,\alpha) = \{0, 1\}$. Hence $X = n/2$ is a constant. Readers who balk
at the idea of supplying a 2-CNF formula as an example for an exponential-time
algorithm may try to generalize (10) for values of $k \ge 3$.
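The claim that $X$ is constant for formula (10) can be checked by brute force for a small instance. The sketch below is illustrative and rests on an assumed encoding that is not in the paper: the boolean values true and false are represented by the CSP values 1 and 2, so a positive literal $x$ becomes $(x \neq 2)$ and a negated literal $\bar{x}$ becomes $(x \neq 1)$. The names `formula_10` and `guessed_bits` are hypothetical.

```python
from itertools import permutations
from math import log2


def substitute(formula, x, b):
    """Return F^[x -> b]: drop satisfied constraints, shrink the remaining ones."""
    reduced = []
    for constraint in formula:
        if any(v == x and c != b for v, c in constraint):
            continue
        reduced.append(tuple(lit for lit in constraint if lit != (x, b)))
    return reduced


def formula_10(pairs):
    """Clauses (x or not y), (not x or y), (x or y) for each pair, with true = 1, false = 2."""
    formula = []
    for i in range(pairs):
        x, y = f"x{i}", f"y{i}"
        formula += [((x, 2), (y, 1)), ((x, 1), (y, 2)), ((x, 2), (y, 2))]
    return formula


def guessed_bits(formula, d, order, alpha):
    """X = sum over all variables of log2 |S(x, pi, alpha)| for one permutation."""
    current = list(formula)
    total = 0.0
    for x in order:
        forbidden = {c for con in current if len(con) == 1 for v, c in con if v == x}
        total += log2(d - len(forbidden))
        current = substitute(current, x, alpha[x])
    return total


if __name__ == "__main__":
    F = formula_10(2)
    variables = ["x0", "y0", "x1", "y1"]
    alpha = {v: 1 for v in variables}
    values = {guessed_bits(F, 2, order, alpha) for order in permutations(variables)}
    print(values)
```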



3.5 A Better Analysis 

After this interlude on Jensen's inequality, let us try to bound
$\mathbf{E}[\log_2 |S(x,\pi,\alpha)|]$ directly. In this context, $x$ is some
variable, $\alpha$ is a satisfying assignment, for simplicity
$\alpha = (1, \dots, 1)$, and $\pi$ is a permutation of the variables sampled
uniformly at random. Again think of the $d$ truth assignments
$\alpha_1, \dots, \alpha_d$ obtained by setting
$\alpha_c := \alpha[x \mapsto c]$ for $c = 1, \dots, d$. Among them,
$\ell := \ell(\alpha,x)$ satisfy the formula $F$. We assume without loss of
generality that those are $\alpha_1, \dots, \alpha_\ell$. Thus, for each
$\ell < c \le d$, there is a constraint $C_c$ satisfied by $\alpha$ but not by
$\alpha_c$. Let us write down these constraints:

\[
\begin{aligned}
C_{\ell+1} &:= (x \neq \ell+1 \vee y_1^{(\ell+1)} \neq 1 \vee \dots \vee y_{k-1}^{(\ell+1)} \neq 1) \\
C_{\ell+2} &:= (x \neq \ell+2 \vee y_1^{(\ell+2)} \neq 1 \vee \dots \vee y_{k-1}^{(\ell+2)} \neq 1) \\
&\;\;\vdots \\
C_d &:= (x \neq d \vee y_1^{(d)} \neq 1 \vee \dots \vee y_{k-1}^{(d)} \neq 1)
\end{aligned}
\tag{11}
\]



We define binary random variables $Y_j^{(c)}$ for $1 \le j \le k - 1$ and
$\ell + 1 \le c \le d$ as follows:

\[
Y_j^{(c)} :=
\begin{cases}
1 & \text{if } y_j^{(c)} \text{ comes after } x \text{ in the permutation } \pi , \\
0 & \text{otherwise} .
\end{cases}
\]

We define $Y^{(c)} := Y_1^{(c)} \vee \dots \vee Y_{k-1}^{(c)}$. For convenience
we also introduce random variables $Y^{(1)}, \dots, Y^{(\ell)}$ that are
constant 1. Finally, we define $Y := \sum_{c=1}^{d} Y^{(c)}$. Observe that
$Y^{(c)} = 0$ if and only if all variables $y_1^{(c)}, \dots, y_{k-1}^{(c)}$
come before $x$ in the permutation, in which case
$c \notin S(x,\pi,\alpha)$. Therefore,

\[
|S(x,\pi,\alpha)| \le Y . \tag{12}
\]

The variables $Y^{(1)}, \dots, Y^{(\ell)}$ are constant 1, whereas each of the
$Y^{(\ell+1)}, \dots, Y^{(d)}$ is 0 with probability at least $1/k$. Since
$1 \le \ell \le d$, the random variable $Y$ can take values from 1 to $d$. We
want to bound

\[
\mathbf{E}[\log_2 |S(x,\pi,\alpha)|] \le \mathbf{E}[\log_2(Y)]
  = \mathbf{E}\!\left[ \log_2\left( \ell + \sum_{c=\ell+1}^{d} Y^{(c)} \right) \right] . \tag{13}
\]



For this, we must bound the probability $\Pr[Y = j]$ for $j = 1, \dots, d$.
This is difficult, since the $Y^{(c)}$ are not independent: For example,
conditioning on $x$ coming very early in $\pi$ increases the expectation of
each $Y^{(c)}$, and conditioning on $x$ coming late decreases it. We use a
standard trick, also used by Paturi, Pudlák, Saks, and Zane [3], to overcome
these dependencies: Instead of viewing $\pi$ as a permutation of $V$, we think
of it as a function $V \to [0,1]$ where for each $x \in V$, its value $\pi(x)$
is chosen uniformly at random from $[0,1]$. With probability 1, all values
$\pi(x)$ are distinct and therefore give rise to a permutation. The trick is
that for $x$, $y$, and $z$ being three distinct variables, the events "$y$
comes before $x$" and "$z$ comes before $x$" are independent when conditioning
on $\pi(x) = r$:

\[
\begin{aligned}
\Pr[\pi(y) < \pi(x) \mid \pi(x) = r] &= r \\
\Pr[\pi(z) < \pi(x) \mid \pi(x) = r] &= r \\
\Pr[\pi(y) < \pi(x) \text{ and } \pi(z) < \pi(x) \mid \pi(x) = r] &= r^2
\end{aligned}
\]

Compare this to the unconditional probabilities:

\[
\begin{aligned}
\Pr[\pi(y) < \pi(x)] &= 1/2 \\
\Pr[\pi(z) < \pi(x)] &= 1/2 \\
\Pr[\pi(y) < \pi(x) \text{ and } \pi(z) < \pi(x)] &= 1/3
\end{aligned}
\]
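The unconditional probabilities can be confirmed by enumerating the six orderings of three variables. This is a small sketch with a hypothetical helper `order_probability`; it also illustrates that the two events are dependent without conditioning, since the joint probability differs from the product of the marginals, while integrating the conditional value $r^2$ over $r \in [0,1]$ recovers the joint probability.

```python
from fractions import Fraction
from itertools import permutations


def order_probability(event, variables=("x", "y", "z")):
    """Probability of `event` under a uniformly random ordering of the variables.

    `event` receives a dict mapping each variable to its position."""
    perms = list(permutations(variables))
    hits = sum(1 for p in perms if event({v: i for i, v in enumerate(p)}))
    return Fraction(hits, len(perms))


p_y = order_probability(lambda pos: pos["y"] < pos["x"])
p_z = order_probability(lambda pos: pos["z"] < pos["x"])
p_both = order_probability(lambda pos: pos["y"] < pos["x"] and pos["z"] < pos["x"])

# Integral of r^2 over [0, 1]: the conditional joint probability averaged over pi(x) = r.
integral_r_squared = Fraction(1, 3)

if __name__ == "__main__":
    print(p_y, p_z, p_both, p_y * p_z)
```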



We want to compute $\mathbf{E}[Y^{(c)} \mid \pi(x) = r]$. We know that
$\mathbf{E}[Y_j^{(c)} \mid \pi(x) = r] = 1 - r$, since $Y_j^{(c)}$ is 1 if and
only if the variable $y_j^{(c)}$ comes after $x$. Since we are dealing with
constraints of size at most $k$, there are, for each $\ell + 1 \le c \le d$, at
most $k - 1$ distinct variables $y_1^{(c)}, \dots, y_{k-1}^{(c)}$, and the
probability that all come before $x$, conditioned on $\pi(x) = r$, is at least
$r^{k-1}$. Therefore

\[
\mathbf{E}[Y^{(c)} \mid \pi(x) = r] \le 1 - r^{k-1} .
\]

Still, a variable $y_j^{(c)}$ might occur in several constraints among
$C_{\ell+1}, \dots, C_d$, and therefore the $Y^{(c)}$ are not independent. The
main technical tool of our analysis is a lemma stating that the worst case is
achieved exactly if they in fact are independent, i.e., if all variables
$y_j^{(c)}$ for $c = \ell + 1, \dots, d$ and $j = 1, \dots, k - 1$ are
distinct.

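Lemma 3.4 (Independence is Worst Case). Let $r$, $k$, $\ell$ and $Y^{(c)}$ be
defined as above. Let $Z^{(\ell+1)}, \dots, Z^{(d)}$ be independent binary
random variables with $\mathbf{E}[Z^{(c)}] = 1 - r^{k-1}$. Then

\[
\mathbf{E}\!\left[ \log_2\left( \ell + \sum_{c=\ell+1}^{d} Y^{(c)} \right) \,\middle|\, \pi(x) = r \right]
  \le \mathbf{E}\!\left[ \log_2\left( \ell + \sum_{c=\ell+1}^{d} Z^{(c)} \right) \right] .
\]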

Before we prove the lemma in the next section, we first finish the analysis of
the algorithm. We apply a somewhat peculiar estimate: Let $a \ge 1$ and
$b \ge 0$ be integers. Then
$\log_2(a + b) \le \log_2(a \cdot (b + 1)) = \log_2(a) + \log_2(b + 1)$.
Applying this with $a := \ell$ and $b := \sum_{c=\ell+1}^{d} Z^{(c)}$ and
combining it with the lemma and with (13), we obtain

\[
\mathbf{E}[\log_2 |S(x,\pi,\alpha)| \mid \pi(x) = r]
  \le \log_2(\ell) + \mathbf{E}\!\left[ \log_2\left( 1 + \sum_{c=\ell+1}^{d} Z^{(c)} \right) \right] . \tag{14}
\]



This estimate looks wasteful, but consider the case where $F$ has a unique
satisfying assignment $\alpha$: There, $\ell(\alpha,x) = 1$ for every variable
$x$, and (14) holds with equality. In addition to
$Z^{(\ell+1)}, \dots, Z^{(d)}$ we introduce $\ell - 1$ new independent binary
random variables $Z^{(2)}, \dots, Z^{(\ell)}$, each with expectation
$1 - r^{k-1}$, and define

\[
g(d,k,r) := \mathbf{E}\!\left[ \log_2\left( 1 + \sum_{c=2}^{d} Z^{(c)} \right) \right] .
\]

The only difference between the expectation in (14) and here is that here, we
sum over $c = 2, \dots, d$, whereas in (14) we sum only over
$c = \ell + 1, \dots, d$. We get the following version of (14):

\[
\mathbf{E}[\log_2 |S(x,\pi,\alpha)| \mid \pi(x) = r] \le \log_2(\ell) + g(d,k,r) . \tag{15}
\]

We want to get rid of the condition $\pi(x) = r$. This is done by integrating
(15) for $r$ from 0 to 1.

\[
\mathbf{E}[\log_2 |S(x,\pi,\alpha)|] \le \log_2(\ell) + \int_0^1 g(d,k,r) \, dr
  =: \log_2(\ell) + G(d,k) . \tag{16}
\]

This $G(d,k)$ is indeed the same $G(d,k)$ as in Theorem 1.1, and below we will
do a detailed calculation showing this.

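Lemma 3.5 (Lemma 1 in Feder, Motwani [1]). Let $F$ be a satisfiable CSP
formula over variable set $V$. Then

\[
\sum_{\alpha \in \mathrm{sat}_V(F)} \prod_{x \in V} \frac{1}{\ell(\alpha,x)} \ge 1 .
\]

This lemma is a quantitative version of the intuitive statement that if a set
$S \subseteq [d]^n$ is small, then there must be rather isolated points in $S$.
We now put everything together:

\[
\begin{aligned}
\Pr[\mathrm{ppz}(F,\pi) \text{ is successful}]
  &= \sum_{\alpha \in \mathrm{sat}_V(F)} \Pr[\mathrm{ppz}(F,\pi) \text{ returns } \alpha] \\
  &\ge \sum_{\alpha \in \mathrm{sat}_V(F)} 2^{-\sum_{x \in V} \mathbf{E}[\log_2 |S(x,\pi,\alpha)|]} ,
\end{aligned}
\]

where the inequality follows from (4). Together with (16), we see that

\[
\begin{aligned}
\sum_{\alpha \in \mathrm{sat}_V(F)} 2^{-\sum_{x \in V} \mathbf{E}[\log_2 |S(x,\pi,\alpha)|]}
  &\ge \sum_{\alpha \in \mathrm{sat}_V(F)} 2^{-\sum_{x \in V} \left( \log_2 \ell(\alpha,x) + G(d,k) \right)} \\
  &= 2^{-nG(d,k)} \sum_{\alpha \in \mathrm{sat}_V(F)} \prod_{x \in V} \frac{1}{\ell(\alpha,x)} \\
  &\ge 2^{-nG(d,k)} ,
\end{aligned}
\]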



where the last inequality follows from Lemma 3.5. To prove Theorem 1.1, we
evaluate the term $G(d,k)$. Recall that $G(d,k) = \int_0^1 g(d,k,r) \, dr$,
where
$g(d,k,r) = \mathbf{E}\!\left[ \log_2\left( 1 + \sum_{c=2}^{d} Z^{(c)} \right) \right]$
and $Z^{(2)}, \dots, Z^{(d)}$ are independent binary variables with expectation
$1 - r^{k-1}$ each. For $0 \le j \le d - 1$, it holds that

\[
\Pr\!\left[ \sum_{c=2}^{d} Z^{(c)} = j \right]
  = \binom{d-1}{j} \left(1 - r^{k-1}\right)^{j} \left(r^{k-1}\right)^{d-1-j} . \tag{18}
\]

By the definition of expectation, it holds that

\[
g(d,k,r) = \sum_{j=0}^{d-1} \log_2(1 + j) \Pr\!\left[ \sum_{c=2}^{d} Z^{(c)} = j \right] .
\]

Combining this with (18) and integrating over $r$ from 0 to 1 yields the
expression in Theorem 1.1. This finishes the proof.

4 A Correlation Inequality 

The goal of this section is to prove Lemma 3.4. We will prove a more general
statement.

Definition 4.1. A function $f : \{0,1\}^n \to \mathbb{R}$ is called
monotonically increasing, or simply monotone, if for all $x, y \in \{0,1\}^n$
it holds that

\[
x \le y \;\Rightarrow\; f(x) \le f(y) , \tag{19}
\]

where $x \le y$ is understood pointwise, i.e., $x_i \le y_i$ for all
$1 \le i \le n$.

For example, the functions $\wedge$ and $\vee$, seen as functions from
$\{0,1\}^n$ to $\mathbb{R}$, are monotone, whereas the parity function is not.

Definition 4.2. A function $f : \{0,1\}^n \to \mathbb{R}$ is called submodular
if for all $x, y \in \{0,1\}^n$, it holds that

\[
f(x) + f(y) \ge f(x \wedge y) + f(x \vee y) , \tag{20}
\]

where $\vee$ and $\wedge$ are understood pointwise, i.e.,
$(x_1, \dots, x_n) \vee (y_1, \dots, y_n) = (x_1 \vee y_1, \dots, x_n \vee y_n)$.

Example. The OR-function $f : (x_1, \dots, x_n) \mapsto x_1 \vee \dots \vee x_n$
is monotone and submodular: It is pretty clear that it is monotone, so let us
try to show submodularity. There are two cases: First, suppose at least one of
$x$ and $y$ is $\mathbf{0}$, say $y = \mathbf{0}$. Then the left-hand side of
(20) evaluates to $f(x)$, and the right-hand side to
$f(\mathbf{0}) + f(x) = f(x)$. If neither $x = \mathbf{0}$ nor
$y = \mathbf{0}$, then the left-hand side is 2, and the right-hand side is
obviously at most 2.

Example. The AND-function $g : (x_1, \dots, x_n) \mapsto x_1 \wedge \dots \wedge x_n$
is monotone, but not submodular. It is clearly monotone, so let us show that it
is not submodular. Consider $n = 2$. Set $x = (0, 1)$ and $y = (1, 0)$. Then
$g(x) + g(y) = 0$, but $g(x \wedge y) + g(x \vee y) = g(0, 0) + g(1, 1) = 1$.

We define the notion of glued restrictions of functions. Let $A$, $B$ be two
arbitrary sets, and let $f : A^n \to B$ be a function. We define a new function
$f'$ by "gluing together" two input coordinates of $f$. Formally, for
$1 \le i < j \le n$, we define the function

\[
f' : (a_1, \dots, a_n) \mapsto f(a_1, \dots, a_{j-1}, a_i, a_{j+1}, \dots, a_n) .
\]

The function $f'$ can be viewed as a restriction of $f$ to inputs
$(a_1, \dots, a_n)$ for which $a_i = a_j$. Thus, $f'$ can be seen as a function
$A^{n-1} \to B$. We prefer, however, to define it as a function $A^n \to B$ that
simply ignores the $j$th coordinate of its input. We say $f'$ is obtained from
$f$ by a gluing step. A function $g : A^n \to B$ is a glued restriction of $f$
if it can be obtained from $f$ by a sequence of gluing steps. See Figure 1 for
an intuition.

Fig. 1. A 7-ary function $f$ and a glued restriction $g$.
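The definitions above are easy to check by brute force for small $n$. The following sketch is illustrative only: inputs are tuples of 0/1 bits, coordinates are 0-based, and the names `is_monotone`, `is_submodular`, and `glue` are hypothetical. It confirms the two examples (OR is monotone and submodular, AND is monotone but not submodular) and performs one gluing step.

```python
from itertools import product


def is_monotone(f, n):
    """Check x <= y (pointwise) implies f(x) <= f(y) over {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    return all(
        f(x) <= f(y)
        for x in points
        for y in points
        if all(a <= b for a, b in zip(x, y))
    )


def is_submodular(f, n):
    """Check f(x) + f(y) >= f(x AND y) + f(x OR y) over {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            meet = tuple(a & b for a, b in zip(x, y))
            join = tuple(a | b for a, b in zip(x, y))
            if f(x) + f(y) < f(meet) + f(join):
                return False
    return True


def glue(f, i, j):
    """One gluing step: the result ignores coordinate j and uses coordinate i instead."""
    def glued(a):
        a = list(a)
        a[j] = a[i]
        return f(tuple(a))
    return glued


def OR(x):
    return int(any(x))


def AND(x):
    return int(all(x))


if __name__ == "__main__":
    print(is_monotone(OR, 3), is_submodular(OR, 3))
    print(is_monotone(AND, 2), is_submodular(AND, 2))
    print(is_submodular(glue(OR, 0, 2), 3))
```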

Consider a function $f : \{0,1\}^n \to \mathbb{R}$ and think of feeding $f$
with random input bits. Formally, let $X_1, \dots, X_n$ be $n$ independent
binary random variables, each with expectation $p$. We are interested in the
term $\mathbf{E}[f(X_1, \dots, X_n)]$. In a second scenario, we introduce
dependencies between the $X_i$ by gluing some of them together: For example,
instead of choosing $X_1, \dots, X_n$ independently, we use the same bit for
$X_1$, $X_2$, and $X_n$, thus computing
$\mathbf{E}[f(X_1, X_1, X_3, X_4, \dots, X_{n-1}, X_1)]$ instead of
$\mathbf{E}[f(X_1, \dots, X_n)]$. With the terminology introduced above, we
want to compare $\mathbf{E}[f(X_1, \dots, X_n)]$ to
$\mathbf{E}[g(X_1, \dots, X_n)]$, where $g$ is a glued restriction of $f$. For
general functions $f$, we cannot say anything about how
$\mathbf{E}[f(X_1, \dots, X_n)]$ compares to $\mathbf{E}[g(X_1, \dots, X_n)]$.
However, if $f$ is submodular, we can.

To get an intuition, consider the boolean lattice $\{0,1\}^n$ with
$\mathbf{0}$ at the bottom and $\mathbf{1}$ at the top. In that lattice,
$x \wedge y$ is below $x$ and $y$, and $x \vee y$ is above them. Thus, in some
sense, the points $x$ and $y$ lie between $x \wedge y$ and $x \vee y$. See
Figure 2 for an illustration. On the left-hand side of (20), we evaluate $f$ at
points that lie more to the middle of the lattice, whereas on the right-hand
side we evaluate $f$ at points that lie more to the bottom or top of it. The
random vector $(X_1, \dots, X_n)$ tends to lie around the $pn$th level of the
lattice, whereas $(X_1, X_1, X_3, X_4, \dots, X_{n-1}, X_1)$ is less
concentrated and more often visits the extremes of the lattice. In the light of
(20), we expect that biasing points towards the extremes will decrease
$\mathbf{E}[f]$. The following lemma formalizes this intuition.

Fig. 2. The boolean lattice with four points $x$, $y$, $x \wedge y$ and $x \vee y$.

Lemma 4.3. Let $f : \{0,1\}^n \to \mathbb{R}$ be a submodular function and $g$
be a glued restriction of it. Let $X_1, \dots, X_n$ be independent binary
random variables, each with expectation $p$. Then
$\mathbf{E}[f(X_1, \dots, X_n)] \ge \mathbf{E}[g(X_1, \dots, X_n)]$.

Proof. It is easy to see that applying a gluing step to a submodular function 
results in a submodular function: After all, a gluing step simply means restricting 
the function to a subset of its domain. Therefore, it suffices to prove the lemma 
for a function $g$ that has been obtained from $f$ by a single gluing step. Without
loss of generality, we can assume that $X_{n-1}$ and $X_n$ have been glued together.
We have to show that

$$\mathbb{E}[f(X_1, \dots, X_n)] \ge \mathbb{E}[f(X_1, \dots, X_{n-1}, X_{n-1})] .$$

It suffices to show this inequality for every fixed $(n-2)$-tuple of values for
$(X_1, \dots, X_{n-2})$. Formally, for $b_1, \dots, b_{n-2} \in \{0,1\}$, let

$$g : (x, y) \mapsto f(b_1, \dots, b_{n-2}, x, y) .$$

The function $g$ is also submodular. Let $X, Y$ be two independent binary random
variables, each with expectation $p$. We have to show that
$\mathbb{E}[g(X, Y)] \ge \mathbb{E}[g(X, X)]$. This is not difficult:

$$
\begin{aligned}
\mathbb{E}[g(X, Y)] &= (1-p)^2 \cdot g(0,0) + p(1-p) \cdot g(1,0) + (1-p)p \cdot g(0,1) + p^2 \cdot g(1,1) \\
&= (1-p)^2 \cdot g(0,0) + p(1-p) \cdot \bigl(g(1,0) + g(0,1)\bigr) + p^2 \cdot g(1,1) \\
&\ge (1-p)^2 \cdot g(0,0) + p(1-p) \cdot \bigl(g(0,0) + g(1,1)\bigr) + p^2 \cdot g(1,1) \\
&= \bigl((1-p)^2 + p(1-p)\bigr) \cdot g(0,0) + \bigl(p(1-p) + p^2\bigr) \cdot g(1,1) \\
&= (1-p) \cdot g(0,0) + p \cdot g(1,1) = \mathbb{E}[g(X, X)] ,
\end{aligned}
$$

where the inequality comes from the submodularity of $g$, applied to the points $(1,0)$ and $(0,1)$. □
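
Lemma 4.3 can be sanity-checked by brute force on a small example. The following Python sketch is only an illustration under stated assumptions: the function `f` (a concave function of the number of ones), the choice $n = 4$, and the particular gluing (the fourth coordinate reuses the second bit) are hypothetical and not taken from the paper. It computes both expectations exactly by enumerating $\{0,1\}^4$.

```python
from itertools import product
from math import log2


def expectation(func, n, p):
    """Exact E[func(X_1, ..., X_n)] for independent Bernoulli(p) bits."""
    total = 0.0
    for bits in product((0, 1), repeat=n):
        ones = sum(bits)
        total += p ** ones * (1 - p) ** (n - ones) * func(bits)
    return total


def is_submodular(func, n, eps=1e-12):
    """Brute-force check of f(x) + f(y) >= f(x AND y) + f(x OR y)."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            meet = tuple(a & b for a, b in zip(x, y))
            join = tuple(a | b for a, b in zip(x, y))
            if func(x) + func(y) < func(meet) + func(join) - eps:
                return False
    return True


def f(bits):
    # Illustrative choice (an assumption): a concave function of the number
    # of ones, which is submodular on the boolean lattice.
    return log2(1 + sum(bits))


def g(bits):
    # Glued restriction of f: the fourth coordinate reuses the second bit,
    # so g ignores bits[3] entirely.
    return f((bits[0], bits[1], bits[2], bits[1]))


if __name__ == "__main__":
    assert is_submodular(f, 4)
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        # First column: E[f], second column: E[g]; the lemma predicts E[f] >= E[g].
        print(p, expectation(f, 4, p), expectation(g, 4, p))
```

Since the enumeration is exact, any violation of the inequality for a submodular `f` would indicate a bug in the example rather than a counterexample to the lemma.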

Lemma 4.4. Let $I \subseteq \mathbb{R}$ be an interval, and let $f : \{0,1\}^n \to I$ be monotone
and submodular, and $h : I \to \mathbb{R}$ be non-decreasing and concave. Then
$h \circ f : \{0,1\}^n \to \mathbb{R}$ is also monotone and submodular.

Proof. It is clear that $h \circ f$, being the composition of two monotone functions, is
again monotone. To show submodularity, consider $x, y \in \{0,1\}^n$. Without loss
of generality, $f(x) \le f(y)$. Using monotonicity, we see that

$$f(x \wedge y) \le f(x) \le f(y) \le f(x \vee y) .$$

Claim. If $s \le t$ are in $I$, and $a \ge b \ge 0$ are such that $s - a \in I$ and $t + b \in I$,
then $h(s) + h(t) \ge h(s - a) + h(t + b)$.

See Figure 3 for an illustration. To prove the claim, compare the line from
$(s, h(s))$ to $(t, h(t))$ to the line from $(s - a, h(s - a))$ to $(t + b, h(t + b))$. The
midpoints of those lines have the coordinates

$$\left(\frac{s + t}{2}, \frac{h(s) + h(t)}{2}\right) \quad \text{and} \quad \left(\frac{s - a + t + b}{2}, \frac{h(s - a) + h(t + b)}{2}\right) ,$$

respectively. Since $a \ge b$, the first midpoint lies weakly to the right of the second
midpoint. Since both lines have nonnegative slope (by monotonicity of $h$) and the first
line lies above the second (by concavity of $h$), we conclude that also the first midpoint
lies above the second. Therefore $(h(s - a) + h(t + b))/2 \le (h(s) + h(t))/2$, as claimed.

Fig. 3. A monotone concave function $h$ and two line segments.

We apply the above claim with $s = f(x)$, $t = f(y)$, $a = f(x) - f(x \wedge y)$ and
$b = f(x \vee y) - f(y)$. Note that $s, t, s - a, t + b \in I$ and $a, b \ge 0$. To apply the
claim we need that $a \ge b$, i.e.,

$$f(x) - f(x \wedge y) \ge f(x \vee y) - f(y) ,$$

which follows from submodularity. The claim implies that
$h(s) + h(t) \ge h(s - a) + h(t + b)$, which with these particular values of $s, t, a$, and $b$ yields
$h(f(x)) + h(f(y)) \ge h(f(x \wedge y)) + h(f(x \vee y))$. □
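
The statement of Lemma 4.4 can be illustrated on a small coverage function. The sketch below is an assumption-laden example, not part of the paper: the set system `SETS` and the two choices of $h$ (the concave $\sqrt{t}$ and the convex $t^2$) are hypothetical. It checks by brute force that composing with a non-decreasing concave function preserves submodularity, and that dropping concavity can destroy it.

```python
from itertools import product
from math import sqrt

# Hypothetical set system; bit i selects SETS[i].
SETS = [{1, 2}, {2, 3}, {3, 4, 5}]
N = len(SETS)
POINTS = list(product((0, 1), repeat=N))


def coverage(bits):
    """Number of elements covered by the selected sets."""
    covered = set()
    for bit, members in zip(bits, SETS):
        if bit:
            covered |= members
    return len(covered)


def is_monotone(func):
    """func(x) <= func(y) whenever x <= y coordinate-wise."""
    return all(
        func(x) <= func(y) + 1e-12
        for x in POINTS
        for y in POINTS
        if all(a <= b for a, b in zip(x, y))
    )


def is_submodular(func):
    """func(x) + func(y) >= func(x AND y) + func(x OR y) for all x, y."""
    for x in POINTS:
        for y in POINTS:
            meet = tuple(a & b for a, b in zip(x, y))
            join = tuple(a | b for a, b in zip(x, y))
            if func(x) + func(y) < func(meet) + func(join) - 1e-12:
                return False
    return True


def concave_of_coverage(bits):
    # h(t) = sqrt(t) is non-decreasing and concave on [0, infinity).
    return sqrt(coverage(bits))


def convex_of_coverage(bits):
    # h(t) = t^2 is non-decreasing but not concave, so the lemma does not apply.
    return coverage(bits) ** 2
```

With these definitions, `is_submodular(concave_of_coverage)` holds as the lemma predicts, while `convex_of_coverage` fails the check, for instance at the points $(1,0,0)$ and $(0,0,1)$.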

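Proof (of Lemma 3.4). We define $(d - \ell)(k - 1)$ random variables $Z_j^{(c)}$ for
$1 \le j \le k - 1$ and $\ell < c \le d$. These random variables are all independent and
each has expectation $1 - r$. We define the function $f : \{0,1\}^{(d-\ell)(k-1)} \to \mathbb{R}$ by

$$f\left(z_1^{(\ell+1)}, \dots, z_{k-1}^{(d)}\right) = \log_2\left(\ell + \sum_{c=\ell+1}^{d} \mathrm{OR}\left(z_1^{(c)}, \dots, z_{k-1}^{(c)}\right)\right) . \tag{21}$$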

This function is clearly monotone. We claim that it is submodular: The OR-function
is submodular, and it is easy to check that a sum of submodular functions
is again submodular. Finally, the function $t \mapsto \log_2(\ell + t)$ is non-decreasing and concave.
We apply Lemma 4.4 with the interval $I = [0, \infty)$, the submodular function
$\sum_{c=\ell+1}^{d} \mathrm{OR}(z_1^{(c)}, \dots, z_{k-1}^{(c)})$, which takes values in $I$, and the concave function
$t \mapsto \log_2(\ell + t)$. Thus $f$ is submodular and monotone. To prove Lemma 3.4 we
have to show that

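$$\mathbb{E}\left[\log_2\left(\ell + \sum_{c=\ell+1}^{d} Y^{(c)}\right) \,\middle|\, \pi(x) = r\right] \le \mathbb{E}\left[\log_2\left(\ell + \sum_{c=\ell+1}^{d} Z^{(c)}\right)\right] , \tag{22}$$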

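where the $Z^{(c)}$ are independent binary random variables with expectation $1 - r^{k-1}$
and $Y^{(c)} := \mathrm{OR}(Y_1^{(c)}, \dots, Y_{k-1}^{(c)})$, with

$$Y_j^{(c)} := \begin{cases} 1 & \text{if } y_j^{(c)} \text{ comes after } x \text{ in the permutation } \pi , \\ 0 & \text{otherwise.} \end{cases}$$

The left-hand side of (22) thus reads as

$$\mathbb{E}\left[f\left(Y_1^{(\ell+1)}, \dots, Y_{k-1}^{(d)}\right) \,\middle|\, \pi(x) = r\right]$$

for $f$ as defined in (21). Since the $Z^{(c)}$ are independent binary random variables
with expectation $1 - r^{k-1}$, their distribution is identical to the distribution of
$\mathrm{OR}(Z_1^{(c)}, \dots, Z_{k-1}^{(c)})$, and the right-hand side of (22) is equal to

$$\mathbb{E}\left[f\left(Z_1^{(\ell+1)}, \dots, Z_{k-1}^{(d)}\right)\right] .$$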

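We have to show that

$$\mathbb{E}\left[f\left(Y_1^{(\ell+1)}, \dots, Y_{k-1}^{(d)}\right) \,\middle|\, \pi(x) = r\right] \le \mathbb{E}\left[f\left(Z_1^{(\ell+1)}, \dots, Z_{k-1}^{(d)}\right)\right] . \tag{23}$$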

Conditioned on $\pi(x) = r$, the distribution of each $Y_j^{(c)}$ is identical to that of $Z_j^{(c)}$,
but some $Y_j^{(c)}$ are "glued together", since the underlying variables $y_j^{(c)}$ of our CSP
formula need not be distinct. We can, however, assemble the $Y_j^{(c)}$ into groups
according to their underlying variables such that (i) random variables from
the same group have the same underlying variable and thus are identical, (ii) random
variables from different groups are independent. Thus, $f(Y_1^{(\ell+1)}, \dots, Y_{k-1}^{(d)})$ is a
glued restriction of $f(Z_1^{(\ell+1)}, \dots, Z_{k-1}^{(d)})$, or rather can be coupled with a glued
restriction thereof, and thus by Lemma 4.3 the expectation of the former is at
most the expectation of the latter. Therefore (23) holds. □
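
To make the gluing argument concrete, both sides of (23) can be computed exactly for a tiny instance. The sketch below rests on several assumptions that are not from the paper: the parameters `ELL = 1`, `K = 3` and two values $c$, the grouping `GROUPS` (which lets $y_1^{(\ell+1)}$ and $y_1^{(\ell+2)}$ be the same CSP variable), and the modelling assumption that, conditioned on $\pi(x) = r$, each underlying variable independently comes after $x$ with probability $1 - r$.

```python
from itertools import product
from math import log2

ELL = 1          # assumed number of values that are never ruled out
K = 3            # literals per constraint, so each value c has K - 1 bits
NUM_VALUES = 2   # d - ELL, the values c = ELL + 1, ..., d


def f(bits):
    """The function from (21) on NUM_VALUES blocks of K - 1 bits each."""
    width = K - 1
    blocks = [bits[i * width:(i + 1) * width] for i in range(NUM_VALUES)]
    return log2(ELL + sum(1 for block in blocks if any(block)))


def expectation(func, n, q):
    """Exact E[func(B_1, ..., B_n)] for independent Bernoulli(q) bits."""
    total = 0.0
    for bits in product((0, 1), repeat=n):
        ones = sum(bits)
        total += q ** ones * (1 - q) ** (n - ones) * func(bits)
    return total


# GROUPS[i] names the underlying CSP variable of coordinate i. Coordinates 0
# and 2 share a variable here, so the corresponding Y's are glued together.
GROUPS = (0, 1, 0, 2)


def f_glued(group_bits):
    """f evaluated on the Y's, which are determined by one bit per group."""
    return f(tuple(group_bits[group] for group in GROUPS))


def both_sides(r):
    """Left- and right-hand side of (23) for a given position r of x."""
    q = 1 - r  # probability that a fixed variable comes after x
    lhs = expectation(f_glued, max(GROUPS) + 1, q)
    rhs = expectation(f, len(GROUPS), q)
    return lhs, rhs
```

Calling `both_sides(r)` for a few values of `r` in $(0, 1)$ returns a pair whose first entry should not exceed the second, in line with Lemma 4.3 applied to this particular gluing.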